ACOUSTIC ECHO CANCELLATION SYSTEM

FIELD OF INVENTION 

This invention relates to multi-channel acoustic echo cancellation systems and
more particularly to a cancellation system using time-varying all-pass filtering for signal
decorrelation.

BACKGROUND OF INVENTION 

At present, most teleconferencing systems use a single full-duplex audio channel 
for voice communications. These systems also make use of an acoustic echo canceller to 
reduce the undesired echo resulting from the coupling between the loudspeaker and the 
microphone. To make these systems more lifelike, better and more realistic sound
systems are required. High-fidelity, wide-bandwidth (100 to 7000 Hz) voice
communication systems are now being used. However, in order to introduce spatial
realism, more than one channel is needed. Therefore, future teleconferencing systems are
expected to have more than one channel (at least stereo with two channels) of full-duplex
voice communications.

One of the fundamental problems in stereophonic acoustic echo cancellation 
(AEC) systems is that given the input to the loudspeakers and the output of the 
microphones in the receiving room, the echo path cannot be determined uniquely. See for 
example the following references: J. Benesty, D. R. Morgan and M. M. Sondhi, "A
Better Understanding and an Improved Solution to the Problems of Stereophonic
Acoustic Echo Cancellation," Preprint, Proceedings of ICASSP-97, Vol. I, pp. 303-306,
Munich, Germany, April 21-24, 1997; J. Benesty, P. Duhamel and Y. Grenier,
"Multi-Channel Adaptive Filtering Applied to Multi-Channel Acoustic Echo Cancellation,"
Preprint, Submitted to IEEE Trans. on Signal Processing, April 1995; S. Shimauchi and
S. Makino, "Stereo Projection Echo Canceller with True Echo Path Estimation,"
Proceedings of ICASSP-95, pp. 3059-3062, 1995; and M. M. Sondhi, D. R. Morgan and
J. L. Hall, "Stereophonic Acoustic Echo Cancellation - An Overview of the Fundamental
Problem," IEEE Signal Processing Letters, Vol. 2, No. 8, pp. 148-151, August 1995. The
problem is due to the correlation between the stereo signals. As a result, any adaptive 
technique used in stereophonic AEC systems fails to identify the echo path responses 
correctly. To circumvent this problem, it is necessary to develop techniques to 
decorrelate the stereo signals at the input to the loudspeakers without affecting stereo
perception.

Several techniques have been proposed in the past, e.g., addition of random noise,
modulation of the signal, decorrelation filters, inter-channel frequency shifting, etc.
However, these techniques either do not decorrelate the signals or destroy stereo perception
completely. The interleaving comb filtering proposed in Sondhi et al. cited above
gives only partial identification (above 1 kHz) of the echo path responses. Recently, a
technique based on non-linear processing of the stereo signals was proposed in Benesty
et al. cited above. However, as noted by the authors of Benesty et al., for tonal signals the
technique based on non-linearity cannot maintain transparency in perception (it changes
the pitch perception).

SUMMARY OF INVENTION

In accordance with one embodiment of the present invention, a multi-channel
acoustic cancellation system includes time-varying all-pass filtering in signal paths to
provide decorrelation of signals.

IN THE DRAWINGS

Fig. 1 illustrates a prior art stereophonic echo cancellation system; 
Fig. 2 illustrates a stereophonic echo cancellation system according to the present 
invention; 

Fig. 3 is a plot of time delay vs. frequency for all-pass filters with $\alpha_{i,\min} = -0.9$
and $\alpha_{i,\max} = 0$ in Fig. 2; and

Fig. 4 illustrates behavior of misalignment with original signal without the all- 
pass filtered input and according to the present invention with the all-pass filtered input. 

DESCRIPTION OF PREFERRED EMBODIMENT OF PRESENT INVENTION



Fig. 1 shows the configuration of a typical stereophonic echo cancellation system
10. The transmission room 11 (depicted on the left) has two microphones 13 and 15 that
pick up the speech signal, $x$, via the two acoustic paths characterized by the impulse
responses $g_1$ and $g_2$. All acoustic paths are assumed to include the microphone and/or
loudspeaker responses. The output of the $i$-th microphone is then given by (in the frequency
domain)

$$X_i(\omega) = G_i(\omega) X(\omega). \tag{1}$$

In this application, the upper-case letters represent the Fourier transforms of the
time-domain signals denoted by the corresponding lower-case letters. The whole system
is considered as a discrete-time system, ignoring any A/D or D/A converters. These signals
are presented through the set of loudspeakers 27 and 29 in the receiving room 21 (on the
right in Fig. 1). Each microphone 23 and 25 in the receiving room picks up an echo ($y_1$,
$y_2$ in Fig. 1) from each of the loudspeakers. Let $h_{ij}$ be the acoustic path impulse response
from loudspeaker $j$ to microphone $i$. In Fig. 1 the path from speaker 27 to
microphone 23 is $h_{11}$, the path from speaker 27 to microphone 25 is $h_{21}$, the path from
speaker 29 to microphone 25 is $h_{22}$, and the path from speaker 29 to microphone 23 is $h_{12}$. Then
the echoes ($y_1$, $y_2$) picked up by the microphones 23 and 25 in the receiving room 21 are
given by (in the frequency domain)

$$Y_i(\omega) = \sum_j H_{ij}(\omega) X_j(\omega). \tag{2}$$

In the absence of any AEC, the echoes $y_i$ will be passed back to the loudspeakers
17, 19 in the transmission room 11 and will be recirculated again and again. This will
cause multiple echoes or may even result in howling instability. Commonly used AEC
systems use adaptive finite impulse response (FIR) filters that provide estimates of the
echo path responses. The FIR filter coefficients are updated adaptively depending on the
input signals to the loudspeakers and the outputs of the microphones.

In the stereophonic AEC, there are four echo paths ($h_{11}$, $h_{12}$, $h_{21}$ and $h_{22}$) to be
identified. We, therefore, need four adaptive filters 31-34 as shown in Fig. 1. Filter 31 is
the estimate for the $h_{11}$ path, filter 32 is the estimate for the $h_{12}$ path, filter 33 is the
estimate for the $h_{21}$ path and filter 34 is the estimate for the $h_{22}$ path. The estimates
$\hat{h}_{11}$ and $\hat{h}_{12}$ of paths to echo $y_1$ are summed at 37 and the estimates $\hat{h}_{21}$ and $\hat{h}_{22}$ of paths
to echo $y_2$ are summed at 38. The outputs of the AEC filters (which can be thought of as
estimated echoes) are as follows:

$$\hat{Y}_i(\omega) = \sum_j \hat{H}_{ij}(\omega) X_j(\omega).$$

These estimated echoes at 37 and 38 are subtracted at adders 35 and 36 from the
true echoes $y_1$ and $y_2$, giving the error signals ($e_1$ and $e_2$ in Fig. 1),

$$E_i(\omega) = Y_i(\omega) - \hat{Y}_i(\omega).$$

These error signals are used to update the coefficients of the filters 31-34 (represented by
feedback lines 41, 42). Several techniques are available to calculate the filter updates
(e.g., the least mean squares (LMS), the recursive least squares (RLS), the affine
projection (AP) algorithms, etc.). All these techniques attempt to minimize these error
signals in one way or another.
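
As a rough illustration of such an update, the sketch below adapts two FIR filters jointly from a single error signal, as is done for one receiving-room microphone. It uses normalized LMS, a common LMS variant chosen here only for brevity; the text above names LMS, RLS and AP without fixing one. The function name `nlms_stereo`, the step size `mu`, the regularizer `eps` and the two-tap toy echo paths are all assumptions made for the example.

```python
import random

def fir(h, x):
    """Causal FIR filtering of the list x with the coefficient list h."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def nlms_stereo(x1, x2, y, num_taps, mu=0.5, eps=1e-8):
    """Jointly adapt two FIR filters so that h1*x1 + h2*x2 tracks the echo y."""
    h1 = [0.0] * num_taps
    h2 = [0.0] * num_taps
    buf1 = [0.0] * num_taps  # most recent loudspeaker-1 samples, newest first
    buf2 = [0.0] * num_taps  # most recent loudspeaker-2 samples, newest first
    errors = []
    for n in range(len(y)):
        buf1 = [x1[n]] + buf1[:-1]
        buf2 = [x2[n]] + buf2[:-1]
        # Estimated echo: sum of the two adaptive filter outputs.
        y_hat = (sum(h * x for h, x in zip(h1, buf1))
                 + sum(h * x for h, x in zip(h2, buf2)))
        e = y[n] - y_hat  # error signal fed back to both filters
        power = sum(x * x for x in buf1) + sum(x * x for x in buf2) + eps
        step = mu * e / power
        h1 = [h + step * x for h, x in zip(h1, buf1)]
        h2 = [h + step * x for h, x in zip(h2, buf2)]
        errors.append(e)
    return h1, h2, errors

# Toy check with made-up echo paths and UNCORRELATED loudspeaker signals.
rng = random.Random(0)
true_h1, true_h2 = [0.5, -0.3], [0.2, 0.1]
x1 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [a + b for a, b in zip(fir(true_h1, x1), fir(true_h2, x2))]
h1_est, h2_est, errors = nlms_stereo(x1, x2, y, num_taps=2)
```

With independent inputs, as in this toy case, both estimates approach the true paths. The stereo signals of a real teleconference are not independent, and that is where the difficulty arises.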

The data available to the echo canceller are the inputs to the loudspeakers, $x_j$, as
well as the outputs of the microphones, $y_i$, in the receiving room 21. The fundamental
problem of stereophonic AEC systems is that given this set of data, it is not possible to
uniquely determine the echo paths to drive the errors $e_i$ to zero (i.e., to eliminate the
echoes). In order to explain this, let us look at the error in one of the channels (a similar
analysis can be carried out for the other channels). In the frequency domain, this error is
given by

$$E_1(\omega) = \sum_j \left( H_{1j}(\omega) - \hat{H}_{1j}(\omega) \right) G_j(\omega) X(\omega).$$

Let us assume that somehow we have been able to achieve perfect echo
cancellation, i.e., we have $E_1(\omega) = 0$. Assuming that $X(\omega)$ does not have zeros in the
frequencies of interest, the above gives

$$\sum_j \left( H_{1j}(\omega) - \hat{H}_{1j}(\omega) \right) G_j(\omega) = 0. \tag{3}$$

This equation does not imply $\hat{H}_{1j}(\omega) = H_{1j}(\omega)$. Therefore, even if the echo has
been driven to zero, we have not necessarily achieved perfect alignment. In other words,
the canceller has not necessarily identified the true echo path. In fact, the above equation
has infinitely many solutions for $\hat{H}_{1j}(\omega)$. Any adaptation algorithm may lead to any one
of these solutions. Note that so long as the conditions in both the transmitting and the
receiving rooms are fixed, this does not cause any problem as the echo will remain zero.
However, the adaptation technique has to track not only the changes in the receiving
room that change the echo path responses, $h_{ij}$, but also the changes in the conditions in the
transmitting room as reflected through changes in $g_i$. Tracking the conditions in the
transmitting room can be especially problematic as $g_i$ may change abruptly and by a large
amount (e.g., one speaker stops talking and another speaker starts speaking from a
different location).
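
The non-uniqueness can be checked numerically at a single frequency. In the sketch below, all complex values are made up for illustration, and `residual` and `wrong_solution` are hypothetical helper names. Adding $c\,G_2$ to one estimate and subtracting $c\,G_1$ from the other leaves the left-hand side of equation (3) at zero for any constant $c$, even though neither estimate equals the true path.

```python
# One frequency bin; every number here is an arbitrary illustrative value.
G1, G2 = 0.8 + 0.3j, -0.4 + 0.6j    # transmitting-room paths G_j
H11, H12 = 1.2 - 0.5j, 0.3 + 0.9j   # true receiving-room echo paths H_1j

def residual(H11_hat, H12_hat):
    """Left-hand side of equation (3) for channel 1."""
    return (H11 - H11_hat) * G1 + (H12 - H12_hat) * G2

def wrong_solution(c):
    """A family of misaligned estimates that still cancels the echo."""
    return H11 + c * G2, H12 - c * G1

# The residual vanishes for every c, although only c = 0 is the true path.
residuals = [abs(residual(*wrong_solution(c))) for c in (0.0, 1.0, -2.5 + 1.0j)]
```

Because the cancelling combination depends on $G_1$ and $G_2$, any change in the transmitting room breaks the cancellation for a misaligned estimate.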

A detailed discussion of this problem describing several viewpoints can be found
in the above-cited references. In particular, the discussion in Benesty et al. provides a better
understanding of the above problem both in terms of non-uniqueness and misalignment of
the solutions.

As discussed above, the reason for non-perfect alignment is that the two signals
are correlated. Correlation between the stereo signals does not allow sufficient identification of
the echo path responses. Thus, in order to solve the problem, we have to find a technique
to decorrelate the input signals to the loudspeakers, $x_i$, in such a way that it does not affect
the stereo perception in the receiving room.

The stereophonic echo cancellation system 40 according to the present invention is shown in Fig. 2.
Each of the stereo signals is passed through a different all-pass filter 45, 47 denoted by
$\alpha_i(n)$. The argument $n$ is used to indicate that the all-pass filter is time-varying (varying
with $n$).

Rigorously speaking, there is no frequency domain representation of the time-varying
filtering operation used in Fig. 2. However, if we assume that $\alpha_i(n)$ does not
change much for a given window around time instant $n$, then it is possible to assign a
frequency domain transfer function $A_i(\omega, n)$ to the filtering operation at time instant $n$.
Then the frequency spectra of the outputs at time instant $n$ can be formally written as

$$Y_i(\omega, n) = \sum_j H_{ij}(\omega) A_j(\omega, n) X_j(\omega)$$
$$\hat{Y}_i(\omega, n) = \sum_j \hat{H}_{ij}(\omega) A_j(\omega, n) X_j(\omega)$$

Then the error in the $i$-th path is

$$E_i(\omega, n) = \sum_j \left( H_{ij}(\omega) - \hat{H}_{ij}(\omega) \right) A_j(\omega, n) G_j(\omega) X(\omega).$$

Now, if we can achieve perfect echo cancellation by setting $E_i(\omega, n) = 0$, then the
above implies

$$\sum_j \left( H_{ij}(\omega) - \hat{H}_{ij}(\omega) \right) A_j(\omega, n) G_j(\omega) X(\omega) = 0.$$

Since the above must be true for all $n$, i.e., for all variations of $A_j(\omega, n)$ with $n$,
we must have $\hat{H}_{ij}(\omega) = H_{ij}(\omega)$. Thus, by using the time-varying all-pass filter in the
signal path, it is possible to achieve perfect alignment between the adaptive filter and the
true echo path. In practice, perfect alignment is not possible due to the finite impulse
response of the modeling filters (the adaptive filters) as well as due to the noise present in
the signal. However, simulations show that this technique achieves much better
identification of the echo paths than was otherwise possible.
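
One way to see why the time variation forces alignment is to write the zero-error condition at two time instants as a 2-by-2 homogeneous system in the two path mismatches. If the determinant of the all-pass responses is non-zero, the only solution is zero mismatch. The sketch below assumes a first-order all-pass of the standard form $(e^{-j\omega} - \alpha)/(1 - \alpha e^{-j\omega})$; the parameter values and the helper names `allpass` and `determinant` are arbitrary choices for the example.

```python
import cmath
import math

def allpass(alpha, omega):
    """Assumed first-order all-pass response at normalized frequency omega."""
    z_inv = cmath.exp(-1j * omega)
    return (z_inv - alpha) / (1 - alpha * z_inv)

def determinant(alphas_n1, alphas_n2, omega):
    """det [[A1(n1), A2(n1)], [A1(n2), A2(n2)]] at one frequency."""
    a1_n1, a2_n1 = (allpass(a, omega) for a in alphas_n1)
    a1_n2, a2_n2 = (allpass(a, omega) for a in alphas_n2)
    return a1_n1 * a2_n2 - a2_n1 * a1_n2

omega = 0.3 * math.pi
# Filters frozen between the two instants: singular system, mismatch possible.
det_fixed = determinant((-0.5, -0.2), (-0.5, -0.2), omega)
# Filters changed between the two instants: non-singular system.
det_varying = determinant((-0.5, -0.2), (-0.7, -0.1), omega)
```

A vanishing determinant in the frozen case corresponds to the non-unique situation of equation (3); a non-zero determinant in the varying case leaves only the aligned solution, provided the $G_j(\omega)$ are non-zero.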

The system 40 must follow certain constraints. First, the signals that are modified
through the all-pass filters 45, 47 are played back through the loudspeakers in the
receiving room 21. Therefore, the time-variation of the all-pass filters has to be chosen in
such a way that it does not alter the stereo perception of the speech. Second, since an
adaptive filter will be used to identify the echo path responses, the time-variation of the
all-pass filters should be fast enough so that the adaptive technique used cannot track the
changes in the all-pass filters. On the other hand, it is desirable that the adaptive
technique be able to track changes in the receiving room 21. These conflicting
requirements show the importance of proper choice of the time-varying all-pass filters. In
the following, we discuss one possible choice.

The simplest all-pass filter is a first-order filter that can be described by a single
parameter $\alpha_i(n)$. The frequency response of such a system for a given $n$ can be written
as

$$A_i(\omega, n) = \frac{e^{-j\omega} - \alpha_i(n)}{1 - \alpha_i(n) e^{-j\omega}}.$$

Such a filter has several important features, namely

• $|A_i(\omega, n)| = 1.0$, $\forall \omega$ and $\forall n$, i.e., this filter passes all frequencies all the time
unattenuated.

• It only changes the phase of each frequency.

• It is completely determined by a single time-varying parameter $\alpha_i(n)$. Thus,
the design of the system involves proper choice of $\alpha_i(n)$.
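
The first two features can be checked numerically. The sketch below assumes the standard first-order all-pass form $(e^{-j\omega} - \alpha)/(1 - \alpha e^{-j\omega})$ and evaluates it on a grid of frequencies for a few parameter values; the grid and the chosen values of $\alpha$ are arbitrary.

```python
import cmath
import math

def allpass_response(alpha, omega):
    """Assumed first-order all-pass response for a real parameter alpha."""
    z_inv = cmath.exp(-1j * omega)
    return (z_inv - alpha) / (1 - alpha * z_inv)

omegas = [k * math.pi / 64 for k in range(1, 64)]  # frequencies in (0, pi)
alphas = (-0.9, -0.45, 0.0)

# Largest deviation of the magnitude from unity over the whole grid.
max_gain_error = max(abs(abs(allpass_response(a, w)) - 1.0)
                     for a in alphas for w in omegas)

# The phase at a fixed frequency does depend on alpha.
phases_at_quarter_pi = [cmath.phase(allpass_response(a, math.pi / 4))
                        for a in alphas]
```

The magnitude stays at one to within rounding error while the phase moves with $\alpha$, which is the property the decorrelation relies on.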

In order for the all-pass filter to be stable, the absolute value of $\alpha_i(n)$ must
be less than unity. Since all our signals are real, we have also restricted $\alpha_i(n)$ to be a real
value. This also simplifies the filtering operation. $\alpha_i(n)$ is a time-varying parameter.
Thus, we need to update $\alpha_i(n)$ at every time instant. The update rule for $\alpha_i(n)$ is as
follows:

$$\alpha_i(n+1) = \alpha_i(n) + r_i(n),$$
$$\text{set } \alpha_i(n+1) = \alpha_{i,\max} \text{ if } \alpha_i(n+1) > \alpha_{i,\max},$$
$$\text{set } \alpha_i(n+1) = \alpha_{i,\min} \text{ if } \alpha_i(n+1) < \alpha_{i,\min}. \tag{4}$$

Here, $r_i(n)$ is an independent and identically distributed (iid) random variable
having a uniform probability distribution function (pdf) over the interval $[-\beta_i, \beta_i]$. $\beta_i$
indicates the maximum allowable deviation of $\alpha_i(n)$ from one instant to another. This
deviation corresponds to phase jitter introduced by the time-varying all-pass filter for the
$i$-th channel. $\beta_i$ should be made as large as possible to introduce enough signal
decorrelation. However, too large a value of $\beta_i$ will result in a noticeable change in speech
perception.
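
The bounded random walk and its use in the filter can be sketched as follows. This is a minimal illustration, not the implementation of the system: the difference equation $y(n) = -\alpha x(n) + x(n-1) + \alpha y(n-1)$ assumes the standard first-order all-pass form, the direct-form realization with a per-sample parameter update is one possible choice, and the value `beta=0.05`, the seeds and the function names are hypothetical.

```python
import random

def update_alpha(alpha, beta, alpha_min, alpha_max, rng):
    """One step of the bounded random walk: add r ~ U[-beta, beta], then clip."""
    candidate = alpha + rng.uniform(-beta, beta)
    return min(max(candidate, alpha_min), alpha_max)

def time_varying_allpass(x, alpha0, beta, alpha_min=-0.9, alpha_max=0.0, seed=0):
    """Filter x with a first-order all-pass whose parameter changes every sample."""
    rng = random.Random(seed)
    alpha = alpha0
    x_prev = 0.0
    y_prev = 0.0
    y, alphas = [], []
    for sample in x:
        # Assumed realization: y(n) = -alpha*x(n) + x(n-1) + alpha*y(n-1)
        out = -alpha * sample + x_prev + alpha * y_prev
        y.append(out)
        alphas.append(alpha)
        x_prev, y_prev = sample, out
        alpha = update_alpha(alpha, beta, alpha_min, alpha_max, rng)
    return y, alphas

# Different seeds give the two stereo channels independent parameter tracks.
signal = [1.0] + [0.0] * 999  # hypothetical test input (unit impulse)
left, alphas_left = time_varying_allpass(signal, alpha0=-0.45, beta=0.05, seed=1)
right, alphas_right = time_varying_allpass(signal, alpha0=-0.45, beta=0.05, seed=2)
```

Using a separate random generator per channel keeps the two parameter tracks independent, so the two filters differ at almost every instant.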

$\alpha_{i,\max}$ and $\alpha_{i,\min}$ in equation (4) represent the maximum and minimum allowable
values of $\alpha_i(n)$. In order to ensure stability, we must have $\alpha_{i,\max} < 1$ and $\alpha_{i,\min} > -1$.
Further restrictions are also required to maintain transparency in speech perception.
These restrictions are derived from the data known as "just noticeable inter-aural delay"
in psychoacoustics. A discussion of this is found in E. Zwicker and H. Fastl,
Psychoacoustics: Facts and Models, Heidelberg, Germany: Springer-Verlag, 1990. This
data represents the minimum change in the inter-aural time delay between the two ears at
a given frequency that causes a noticeable change in the perception of the direction of
sound. The all-pass filter changes the phase of each frequency of the input speech. The
effect of this phase change is to change the time of arrival of the signal at each frequency at
the ears. So, if we limit the phase changes so that the change in the time of arrival for
each channel is within the just noticeable inter-aural delay, then spatial perception of
the stereo signal will not be affected. The just noticeable inter-aural delay varies between 30
µsec and 200 µsec. We have chosen to limit the change in the time of arrival of each
frequency to within 60 µsec. This leads to the following values of $\alpha_{i,\max}$ and $\alpha_{i,\min}$:

$$\alpha_{i,\max} = 0 \quad \text{and} \quad \alpha_{i,\min} = -0.9.$$

Fig. 3 shows the time delay as a function of frequency for the two all-pass filters
with $\alpha_{i,\min} = -0.9$ and $\alpha_{i,\max} = 0$. Since the values of $\alpha_i(n)$ for the all-pass filters in the
two stereo paths are kept within these limits, the resulting inter-aural delays are also within
60 µsec. Our experiments have shown that this choice leads to good signal decorrelation
to allow correct identification of echo path responses and also keeps the stereo perception
of speech unchanged.
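
The delay curves of the two limiting filters can be approximated with a short calculation. The sketch below assumes the standard first-order all-pass form $(e^{-j\omega} - \alpha)/(1 - \alpha e^{-j\omega})$, interprets "time delay" as the phase delay $-\phi(\omega)/\omega$, and uses the 16 kHz sampling rate of the experiments; the frequency grid and the function names are choices made for the example.

```python
import cmath
import math

FS = 16000.0  # sampling rate in Hz

def allpass_response(alpha, omega):
    """Assumed first-order all-pass response for a real parameter alpha."""
    z_inv = cmath.exp(-1j * omega)
    return (z_inv - alpha) / (1 - alpha * z_inv)

def phase_delay_usec(alpha, freq_hz, fs=FS):
    """Phase delay in microseconds at freq_hz (assumes 0 < freq_hz < fs/2)."""
    omega = 2.0 * math.pi * freq_hz / fs
    # For -1 < alpha <= 0 the phase lies in (-pi, 0], so the principal
    # value returned by cmath.phase needs no unwrapping.
    phase = cmath.phase(allpass_response(alpha, omega))
    return -phase / omega / fs * 1e6

freqs = list(range(100, 7001, 100))  # the 100 to 7000 Hz speech band
delay_alpha_min = [phase_delay_usec(-0.9, f) for f in freqs]
delay_alpha_max = [phase_delay_usec(0.0, f) for f in freqs]
largest_gap = max(hi - lo for hi, lo in zip(delay_alpha_max, delay_alpha_min))
```

Under these assumptions the delay of either filter stays between zero and one sample period (62.5 µsec), so the delay difference between the two channels, `largest_gap`, remains below that bound.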

In order to evaluate the technique, we collected stereo speech samples in our
audio laboratory. The audio laboratory was used as the transmitting room. We had two
speakers talking alternately in the room while two microphones were used to collect the
data. The data were sampled at a 16 kHz sampling rate. In one set of data, the speakers
were asked to stand still while talking. This was done to ensure that the echo path
responses remain the same. In another set, they were free to move around the room as
they talked into the microphones. We then used our technique to decorrelate the collected
stereo signals. We performed informal listening tests by playing the original and the
modified stereo signals over both loudspeakers and headphones. All these tests show that
the stereo perception of the modified signal is indistinguishable from that of the original.

We simulated the receiving room loudspeaker outputs by convolving the stereo
signals with the echo path responses $h_{11}$ and $h_{12}$. These two echo path responses were
obtained using the image method of Allen et al. based on room measurements of one of
our conference rooms. For more details on Allen et al. see the following reference: J.
Allen and D. Berkley, "Image Method for Efficiently Simulating Small-Room
Acoustics," J. Acoust. Soc. Am., Vol. 65, No. 4, pp. 943-950, April 1979. The
microphone output in the receiving room was simulated by summing up the outputs of
these two convolutions. In the above convolutions, we restricted the lengths of the echo
path responses to be N = 4096 samples long. We then used the two adaptive filters $\hat{h}_{11}$
and $\hat{h}_{12}$, each of length L = 2048 samples, to identify these echo path responses. We used
the fast affine projection technique of order 8 for updating the filter coefficients. See
Shimauchi et al., a reference cited above. Fig. 4 shows the misalignment in dB with time.
The misalignment is defined as

$$10 \log_{10} \frac{\| h_{11,1:2048} - \hat{h}_{11} \|^2 + \| h_{12,1:2048} - \hat{h}_{12} \|^2}{\| h_{11,1:2048} \|^2 + \| h_{12,1:2048} \|^2}$$

where the subscript 1:2048 is used to indicate that the first 2048 samples of the
corresponding echo path responses have been used here. This figure corresponds to the
set of data when the transmitting room echo path responses were kept fixed as already
described. The dotted line corresponds to the case of the original signal and the solid line to
the case of modified data using our technique of time-varying all-pass filtering.
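
The misalignment measure can be computed directly from its definition. In the sketch below, the function name `misalignment_db` is hypothetical, the true responses are truncated to the length of the estimates (2048 samples in the experiment), and the short vectors in the usage lines are made-up values.

```python
import math

def misalignment_db(h11, h12, h11_hat, h12_hat):
    """Normalized misalignment in dB between true and estimated echo paths.

    The true responses are truncated to the length of the adaptive filters.
    """
    h11_t = h11[:len(h11_hat)]
    h12_t = h12[:len(h12_hat)]
    num = (sum((a - b) ** 2 for a, b in zip(h11_t, h11_hat))
           + sum((a - b) ** 2 for a, b in zip(h12_t, h12_hat)))
    den = sum(a * a for a in h11_t) + sum(a * a for a in h12_t)
    if num == 0.0:
        return float("-inf")  # perfect alignment
    return 10.0 * math.log10(num / den)

# Toy usage: estimates that are 10% short of the (truncated) true paths.
h11 = [1.0, 0.0, 0.5, 0.2]
h12 = [0.0, 2.0, 0.0, 0.3]
h11_hat = [0.9 * v for v in h11[:3]]
h12_hat = [0.9 * v for v in h12[:3]]
example_db = misalignment_db(h11, h12, h11_hat, h12_hat)
```

A 10% amplitude error on every retained tap gives a squared-error ratio of 0.01, so `example_db` comes out close to -20 dB, which gives a feel for the scale of Fig. 4.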

Since we have used 'real-world' collected data for the transmitted signals, the
situation was not as bad as when simulated data was used. We did not experience sudden
jumps, but with the original signals the misalignment settles at around -14 dB, whereas
with our technique of signal decorrelation the misalignment goes below -20 dB.

IN THE CLAIMS

1. In a communications system having a plurality of microphones at a 
transmitting location transmitting over separate corresponding plurality of channels to 
corresponding speakers in a receiving location and a plurality of microphones at the 
receiving location coupled over corresponding plurality of channels to speakers at the 
transmitting location generating echo signals, a multi-channel acoustic cancellation 
system comprising: 

filter means coupled to output of said plurality of microphones at said transmitting
location and input to said plurality of speakers at receiving location for providing
estimated signals representing estimates of echo path responses from said plurality of
microphones from said receiving location to said plurality of speakers at said transmitting
location;

means coupled to input of said plurality of speakers at said transmitting location 
and output of said microphones at said receiving location for providing true signals 
representing true echo signal; 

means for subtracting said true signals from said estimated signals to reduce echo 
signals and to obtain coefficient control signals representing errors; 

means for coupling said coefficient control signals to said filter means to change 
the filter coefficients to minimize said errors; and 

means for providing decorrelation of said signals using all-pass filters in said
channels having different time varying filtering.

2. The system of Claim 1 wherein the time varying parameter takes a
bounded random walk. 

3. The system of Claim 2 where the bounds in the value are based on data for 
just noticeable time delay difference from psychoacoustics. 

4. The system of Claim 3 where the noticeable delay is between 30 and 200 
microseconds. 

5. The system of Claim 1 where the filter means include finite impulse
response (FIR) filters that have filter coefficients updated adaptively depending on the 
input signals to the loudspeakers and outputs of the microphones. 

6. A multi-channel acoustic cancellation system comprising: 

filter means coupled to output of said plurality of microphones at a transmitting 
location and input to a plurality of speakers at receiving location for providing estimated 
signals representing estimates of echo path responses from said plurality of microphones
from said receiving location to said plurality of speakers at said transmitting location;

means coupled to input of a plurality of speakers at said transmitting location and 
output of a plurality of microphones at said receiving location for providing true signals 
representing true echo signal; 

means for subtracting said true signals from said estimated signals to reduce echo 
signals and to obtain coefficient control signals representing errors; 

means for coupling said coefficient control signals to said filter means to change
the filter coefficients to minimize said errors; and 

means for providing decorrelation of said signals in said separate corresponding 
plurality of channels by providing an all-pass filter having different time varying filtering 
in each channel.

7. The system of Claim 6 wherein the time varying parameter takes a 
bounded random walk. 

8. The system of Claim 7 where the bounds in the value are based on data for 
just noticeable time delay difference from psychoacoustics. 

9. The system of Claim 8 where the noticeable delay is between 30 and 200
microseconds.

10. A multi-channel acoustic cancellation system comprising: 

means coupled in said signal path between said transmitting location and 
said receiving location for reducing echo errors and means in said signal path for 
providing decorrelation of said signals in said separate corresponding plurality of
channels by providing an all-pass filter having different time varying filtering in each 
channel. 

11. The system of Claim 10 wherein the time varying parameter takes a 
bounded random walk. 

12. The system of Claim 11 where the bounds in the value are based on data
for just noticeable time delay difference from psychoacoustics. 

13. The system of Claim 12 where the noticeable delay is between 30 and 200 
microseconds. 

ABSTRACT

A multi-channel acoustic cancellation system 40 with, for example, stereo
speakers and a pair of microphones in the transmitting and receiving rooms (11 and 21)
has time varying all-pass filters (45, 47) in the signal path between the microphones (13,
15) in the transmitting room (11) and the speakers (27, 29) in the receiving room (21) to
provide decorrelation.